Heterogeneous Data Integration with the Consensus Clustering Formalism

نویسندگان

  • Vladimir Filkov
  • Steven Skiena
چکیده

Meaningfully integrating massive multi-experimental genomic data sets is becoming critical for the understanding of gene function. We have recently proposed methodologies for integrating large numbers of microarray data sets based on consensus clustering. Our methods combine gene clusters into a unified representation, or a consensus, that is insensitive to mis-classifications in the individual experiments. Here we extend their utility to heterogeneous data sets and focus on their refinement and improvement. First of all we compare our best heuristic to the popular majority rule consensus clustering heuristic, and show that the former yields tighter consensuses. We propose a refinement to our consensus algorithm by clustering of the source-specific clusterings as a step before finding the consensus between them, thereby improving our original results and increasing their biological relevance. We demonstrate our methodology on three data sets of yeast with biologically interesting results. Finally, we show that our methodology can deal successfully with missing experimental values.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Entropy-based Consensus for Distributed Data Clustering

The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...

متن کامل

Integrating Microarray Data by Consensus Clustering

With the exploding volume of microarray experiments comes increasing interest in mining repositories of such data. Meaningfully combining results from varied experiments on an equal basis is a challenging task. Here we propose a general method for integrating heterogeneous data sets based on the consensus clustering formalism. Our method analyzes source-specific clusterings and identifies a con...

متن کامل

A new ensemble clustering method based on fuzzy cmeans clustering while maintaining diversity in ensemble

An ensemble clustering has been considered as one of the research approaches in data mining, pattern recognition, machine learning and artificial intelligence over the last decade. In clustering, the combination first produces several bases clustering, and then, for their aggregation, a function is used to create a final cluster that is as similar as possible to all the cluster bundles. The inp...

متن کامل

A Robust Symmetric Nonnegative Matrix Factorization Framework for Clustering Multiple Heterogeneous Microbiome Data

Integration of multi-view datasets which are comprised of heterogeneous sources or different representations is challenging to understand the subtle and complex relationship in data. Such data integration methods attempt to combine efficiently the complementary information of multiple data types to construct a comprehensive view of underlying data. Nonnegative matrix factorization (NMF), an app...

متن کامل

Adaptive Distributed Consensus Control for a Class of Heterogeneous and Uncertain Nonlinear Multi-Agent Systems

This paper has been devoted to the design of a distributed consensus control for a class of uncertain nonlinear multi-agent systems in the strict-feedback form. The communication between the agents has been described by a directed graph. Radial-basis function neural networks have been used for the approximation of the uncertain and heterogeneous dynamics of the followers as well as the effect o...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004